Overview

Dataset statistics

Number of variables: 29
Number of observations: 4272
Missing cells: 73301
Missing cells (%): 59.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 968.0 KiB
Average record size in memory: 232.0 B
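
The derived figures in this table follow from the raw counts. A minimal sketch, assuming that missing cells are expressed as a share of variables times observations and that 1 KiB is 1024 bytes (the variable names are illustrative and not part of the report):

```python
# Raw figures copied from the Dataset statistics table.
n_variables = 29
n_observations = 4272
missing_cells = 73301
total_size_kib = 968.0

# Every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions the rounded results agree with the 59.2% and 232.0 B listed in the table.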

Variable types

Numeric: 7
Categorical: 22

Alerts

repeat_instrument_1 has constant value "" [Constant]
repeat_instrument_2 has constant value "" [Constant]
repeat_instance_2 has constant value "" [Constant]
receptor_de_progesterona_quantificacao_%_2 is highly overall correlated with repeat_instance_1 and 5 other fields [High correlation]
receptorde_estrogenio_quantificacao_%_2 is highly overall correlated with repeat_instance_1 and 4 other fields [High correlation]
indice_h_receptorde_progesterona_1 is highly overall correlated with indice_h_receptorde_progesterona_2 and 3 other fields [High correlation]
indice_h_receptorde_progesterona_2 is highly overall correlated with indice_h_receptorde_progesterona_1 and 6 other fields [High correlation]
ki67_%_1 is highly overall correlated with ki67_%_2 and 3 other fields [High correlation]
ki67_%_2 is highly overall correlated with ki67_%_1 and 4 other fields [High correlation]
repeat_instance_1 is highly overall correlated with receptor_de_progesterona_quantificacao_%_2 and 21 other fields [High correlation]
diagnostico_primario_tipo_histologico_1 is highly overall correlated with indice_h_receptorde_progesterona_2 and 2 other fields [High correlation]
diagnostico_primario_tipo_histologico_2 is highly overall correlated with indice_h_receptorde_progesterona_1 and 3 other fields [High correlation]
grau_histologico_1 is highly overall correlated with repeat_instance_1 and 1 other field [High correlation]
grau_histologico_2 is highly overall correlated with repeat_instance_1 and 2 other fields [High correlation]
subtipo_tumoral_1 is highly overall correlated with subtipo_tumoral_2 and 4 other fields [High correlation]
subtipo_tumoral_2 is highly overall correlated with receptor_de_progesterona_quantificacao_%_2 and 9 other fields [High correlation]
receptor_de_estrogenio_1 is highly overall correlated with receptor_de_progesterona_quantificacao_%_2 and 10 other fields [High correlation]
receptor_de_estrogenio_2 is highly overall correlated with receptor_de_progesterona_quantificacao_%_2 and 10 other fields [High correlation]
receptor_de_progesterona_1 is highly overall correlated with repeat_instance_1 and 4 other fields [High correlation]
receptor_de_progesterona_2 is highly overall correlated with receptor_de_progesterona_quantificacao_%_2 and 7 other fields [High correlation]
ki67_>14%_1 is highly overall correlated with ki67_%_1 and 3 other fields [High correlation]
ki67_>14%_2 is highly overall correlated with ki67_%_2 and 2 other fields [High correlation]
receptor_de_progesterona_quantificacao_%_1 is highly overall correlated with receptor_de_progesterona_quantificacao_%_2 and 6 other fields [High correlation]
receptorde_estrogenio_quantificacao_%_1 is highly overall correlated with receptorde_estrogenio_quantificacao_%_2 and 5 other fields [High correlation]
her2_por_ihc_1 is highly overall correlated with her2_por_fish_1 and 1 other field [High correlation]
her2_por_ihc_2 is highly overall correlated with repeat_instance_1 and 1 other field [High correlation]
her2_por_fish_1 is highly overall correlated with repeat_instance_1 and 2 other fields [High correlation]
her2_por_fish_2 is highly overall correlated with repeat_instance_1 and 3 other fields [High correlation]
repeat_instance_1 is highly imbalanced (99.7%) [Imbalance]
diagnostico_primario_tipo_histologico_1 is highly imbalanced (79.7%) [Imbalance]
diagnostico_primario_tipo_histologico_2 is highly imbalanced (84.9%) [Imbalance]
receptor_de_estrogenio_2 is highly imbalanced (52.3%) [Imbalance]
ki67_>14%_1 is highly imbalanced (65.9%) [Imbalance]
ki67_>14%_2 is highly imbalanced (58.7%) [Imbalance]
her2_por_fish_2 is highly imbalanced (52.8%) [Imbalance]
repeat_instrument_2 has 3773 (88.3%) missing values [Missing]
repeat_instance_2 has 3773 (88.3%) missing values [Missing]
diagnostico_primario_tipo_histologico_1 has 1607 (37.6%) missing values [Missing]
diagnostico_primario_tipo_histologico_2 has 3984 (93.3%) missing values [Missing]
grau_histologico_1 has 3078 (72.1%) missing values [Missing]
grau_histologico_2 has 4001 (93.7%) missing values [Missing]
subtipo_tumoral_2 has 3852 (90.2%) missing values [Missing]
receptor_de_estrogenio_1 has 451 (10.6%) missing values [Missing]
receptor_de_estrogenio_2 has 3849 (90.1%) missing values [Missing]
receptor_de_progesterona_1 has 449 (10.5%) missing values [Missing]
receptor_de_progesterona_2 has 3849 (90.1%) missing values [Missing]
ki67_>14%_1 has 748 (17.5%) missing values [Missing]
ki67_>14%_2 has 3796 (88.9%) missing values [Missing]
receptor_de_progesterona_quantificacao_%_1 has 2667 (62.4%) missing values [Missing]
receptor_de_progesterona_quantificacao_%_2 has 3988 (93.4%) missing values [Missing]
receptorde_estrogenio_quantificacao_%_1 has 2449 (57.3%) missing values [Missing]
receptorde_estrogenio_quantificacao_%_2 has 3932 (92.0%) missing values [Missing]
indice_h_receptorde_progesterona_1 has 3811 (89.2%) missing values [Missing]
indice_h_receptorde_progesterona_2 has 4141 (96.9%) missing values [Missing]
her2_por_ihc_1 has 53 (1.2%) missing values [Missing]
her2_por_ihc_2 has 3847 (90.1%) missing values [Missing]
her2_por_fish_1 has 2533 (59.3%) missing values [Missing]
her2_por_fish_2 has 3969 (92.9%) missing values [Missing]
ki67_%_1 has 899 (21.0%) missing values [Missing]
ki67_%_2 has 3800 (89.0%) missing values [Missing]
record_id has unique values [Unique]

Reproduction

Analysis started: 2023-02-28 14:17:50.057661
Analysis finished: 2023-02-28 14:18:32.335801
Duration: 42.28 seconds
Software version: ydata-profiling v4.0.0
Download configuration: config.json
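
The reported duration can be checked from the two timestamps. A minimal sketch using only the Python standard library (the variable names are illustrative):

```python
from datetime import datetime

# Timestamps copied from the Reproduction block.
started = datetime.fromisoformat("2023-02-28 14:17:50.057661")
finished = datetime.fromisoformat("2023-02-28 14:18:32.335801")

# Elapsed wall-clock time in seconds.
duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```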

Variables

record_id
Real number (ℝ)

Distinct: 4272
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 48652.36
Minimum: 302
Maximum: 82240
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 33.5 KiB

Quantile statistics

Minimum: 302
5-th percentile: 13992.4
Q1: 31013
median: 53394
Q3: 65816.75
95-th percentile: 78668.25
Maximum: 82240
Range: 81938
Interquartile range (IQR): 34803.75
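
The last two rows are derived from the others: the range is the maximum minus the minimum, and the IQR is Q3 minus Q1. A short check with the values listed above (variable names are illustrative):

```python
# Quantile statistics for record_id, copied from the table.
minimum, maximum = 302, 82240
q1, q3 = 31013, 65816.75

value_range = maximum - minimum  # spread of the full data
iqr = q3 - q1                    # spread of the middle 50% of the data

print(value_range, iqr)
```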

Descriptive statistics

Standard deviation: 20659.52
Coefficient of variation (CV): 0.4246355
Kurtosis: -0.99374558
Mean: 48652.36
Median Absolute Deviation (MAD): 16732
Skewness: -0.29501895
Sum: 2.0784288 × 10^8
Variance: 4.2681575 × 10^8
Monotonicity: Strictly increasing
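
Several of these statistics are tied together by simple identities: CV is the standard deviation divided by the mean, the variance is the squared standard deviation, and the sum is the mean times the number of observations. A minimal sketch using the rounded values from the report, so the results agree only up to rounding (variable names are illustrative):

```python
import math

# Rounded values copied from the report.
n = 4272
mean = 48652.36
std = 20659.52

cv = std / mean        # coefficient of variation
variance = std ** 2    # variance is the square of the standard deviation
total = mean * n       # sum of all values

print(f"CV       = {cv:.7f}")
print(f"variance = {variance:.7e}")
print(f"sum      = {total:.7e}")
```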
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
302 1
 
< 0.1%
60912 1
 
< 0.1%
60757 1
 
< 0.1%
60774 1
 
< 0.1%
60777 1
 
< 0.1%
60799 1
 
< 0.1%
60815 1
 
< 0.1%
60825 1
 
< 0.1%
60826 1
 
< 0.1%
60840 1
 
< 0.1%
Other values (4262) 4262
99.8%
ValueCountFrequency (%)
302 1
< 0.1%
710 1
< 0.1%
752 1
< 0.1%
1367 1
< 0.1%
1589 1
< 0.1%
1705 1
< 0.1%
1843 1
< 0.1%
1873 1
< 0.1%
1898 1
< 0.1%
1960 1
< 0.1%
ValueCountFrequency (%)
82240 1
< 0.1%
82205 1
< 0.1%
82131 1
< 0.1%
82124 1
< 0.1%
82123 1
< 0.1%
82122 1
< 0.1%
82118 1
< 0.1%
82112 1
< 0.1%
82111 1
< 0.1%
82100 1
< 0.1%

repeat_instrument_1
Categorical

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 33.5 KiB
Dados Histopatologicos Mama
4272 

Length

Max length: 27
Median length: 27
Mean length: 27
Min length: 27

Characters and Unicode

Total characters: 115344
Distinct characters: 15
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
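
The character counts in this section can be reproduced with Python's standard `unicodedata` module. A minimal sketch, assuming the report's "categories" correspond to Unicode general categories (`Lu`, `Ll`, `Zs`); scripts and blocks are not exposed by `unicodedata` and are left out:

```python
import unicodedata
from collections import Counter

value = "Dados Histopatologicos Mama"  # the only value of this variable
n_rows = 4272                           # rows holding that value

# General category (e.g. "Lu", "Ll", "Zs") of every character in one string.
per_string = Counter(unicodedata.category(ch) for ch in value)

# The column repeats the same string, so totals scale linearly with the rows.
totals = {cat: count * n_rows for cat, count in per_string.items()}

print(len(value) * n_rows)  # total characters
print(len(set(value)))      # distinct characters
print(totals)
```

Here `Ll`, `Lu` and `Zs` are the codes for Lowercase Letter, Uppercase Letter and Space Separator.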

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Dados Histopatologicos Mama
2nd row: Dados Histopatologicos Mama
3rd row: Dados Histopatologicos Mama
4th row: Dados Histopatologicos Mama
5th row: Dados Histopatologicos Mama

Common Values

ValueCountFrequency (%)
Dados Histopatologicos Mama 4272
100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
dados 4272
33.3%
histopatologicos 4272
33.3%
mama 4272
33.3%

Most occurring characters

ValueCountFrequency (%)
o 21360
18.5%
a 17088
14.8%
s 12816
11.1%
8544
 
7.4%
i 8544
 
7.4%
t 8544
 
7.4%
D 4272
 
3.7%
d 4272
 
3.7%
H 4272
 
3.7%
p 4272
 
3.7%
Other values (5) 21360
18.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 93984
81.5%
Uppercase Letter 12816
 
11.1%
Space Separator 8544
 
7.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 21360
22.7%
a 17088
18.2%
s 12816
13.6%
i 8544
 
9.1%
t 8544
 
9.1%
d 4272
 
4.5%
p 4272
 
4.5%
l 4272
 
4.5%
g 4272
 
4.5%
c 4272
 
4.5%
Uppercase Letter
ValueCountFrequency (%)
D 4272
33.3%
H 4272
33.3%
M 4272
33.3%
Space Separator
ValueCountFrequency (%)
8544
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 106800
92.6%
Common 8544
 
7.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 21360
20.0%
a 17088
16.0%
s 12816
12.0%
i 8544
 
8.0%
t 8544
 
8.0%
D 4272
 
4.0%
d 4272
 
4.0%
H 4272
 
4.0%
p 4272
 
4.0%
l 4272
 
4.0%
Other values (4) 17088
16.0%
Common
ValueCountFrequency (%)
8544
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 115344
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 21360
18.5%
a 17088
14.8%
s 12816
11.1%
8544
 
7.4%
i 8544
 
7.4%
t 8544
 
7.4%
D 4272
 
3.7%
d 4272
 
3.7%
H 4272
 
3.7%
p 4272
 
3.7%
Other values (5) 21360
18.5%

repeat_instrument_2
Categorical

CONSTANT  MISSING 

Distinct: 1
Distinct (%): 0.2%
Missing: 3773
Missing (%): 88.3%
Memory size: 33.5 KiB
Dados Histopatologicos Mama
499 

Length

Max length: 27
Median length: 27
Mean length: 27
Min length: 27

Characters and Unicode

Total characters: 13473
Distinct characters: 15
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Dados Histopatologicos Mama
2nd row: Dados Histopatologicos Mama
3rd row: Dados Histopatologicos Mama
4th row: Dados Histopatologicos Mama
5th row: Dados Histopatologicos Mama

Common Values

ValueCountFrequency (%)
Dados Histopatologicos Mama 499
 
11.7%
(Missing) 3773
88.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
dados 499
33.3%
histopatologicos 499
33.3%
mama 499
33.3%

Most occurring characters

ValueCountFrequency (%)
o 2495
18.5%
a 1996
14.8%
s 1497
11.1%
998
 
7.4%
i 998
 
7.4%
t 998
 
7.4%
D 499
 
3.7%
d 499
 
3.7%
H 499
 
3.7%
p 499
 
3.7%
Other values (5) 2495
18.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 10978
81.5%
Uppercase Letter 1497
 
11.1%
Space Separator 998
 
7.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 2495
22.7%
a 1996
18.2%
s 1497
13.6%
i 998
 
9.1%
t 998
 
9.1%
d 499
 
4.5%
p 499
 
4.5%
l 499
 
4.5%
g 499
 
4.5%
c 499
 
4.5%
Uppercase Letter
ValueCountFrequency (%)
D 499
33.3%
H 499
33.3%
M 499
33.3%
Space Separator
ValueCountFrequency (%)
998
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 12475
92.6%
Common 998
 
7.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 2495
20.0%
a 1996
16.0%
s 1497
12.0%
i 998
 
8.0%
t 998
 
8.0%
D 499
 
4.0%
d 499
 
4.0%
H 499
 
4.0%
p 499
 
4.0%
l 499
 
4.0%
Other values (4) 1996
16.0%
Common
ValueCountFrequency (%)
998
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 13473
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 2495
18.5%
a 1996
14.8%
s 1497
11.1%
998
 
7.4%
i 998
 
7.4%
t 998
 
7.4%
D 499
 
3.7%
d 499
 
3.7%
H 499
 
3.7%
p 499
 
3.7%
Other values (5) 2495
18.5%

repeat_instance_1
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 33.5 KiB
1.0
4271 
2.0
 
1

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 12816
Distinct characters: 4
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Common Values

ValueCountFrequency (%)
1.0 4271
> 99.9%
2.0 1
 
< 0.1%
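
The IMBALANCE flag on this variable can be checked against the two counts above. A minimal sketch, assuming the score is one minus the Shannon entropy of the class frequencies divided by log2 of the number of classes. This formula reproduces the 99.7% listed in the Alerts, but the report itself does not state it, and the function name `imbalance_score` is illustrative:

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k) for a list of class counts (assumed formula)."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # assumption: a single class is treated as not imbalanced
    # Shannon entropy in bits; empty classes contribute nothing.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# repeat_instance_1: 4271 rows with "1.0" and 1 row with "2.0".
score = imbalance_score([4271, 1])
print(f"{100 * score:.1f}%")
```

A score near 100% means almost all rows fall into a single class; a perfectly even split would score 0%.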

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
1.0 4271
> 99.9%
2.0 1
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
. 4272
33.3%
0 4272
33.3%
1 4271
33.3%
2 1
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 8544
66.7%
Other Punctuation 4272
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 4272
50.0%
1 4271
50.0%
2 1
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
. 4272
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 12816
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
. 4272
33.3%
0 4272
33.3%
1 4271
33.3%
2 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 12816
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
. 4272
33.3%
0 4272
33.3%
1 4271
33.3%
2 1
 
< 0.1%

repeat_instance_2
Categorical

CONSTANT  MISSING 

Distinct: 1
Distinct (%): 0.2%
Missing: 3773
Missing (%): 88.3%
Memory size: 33.5 KiB
2.0
499 

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 1497
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2.0
2nd row: 2.0
3rd row: 2.0
4th row: 2.0
5th row: 2.0

Common Values

ValueCountFrequency (%)
2.0 499
 
11.7%
(Missing) 3773
88.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
2.0 499
100.0%

Most occurring characters

ValueCountFrequency (%)
2 499
33.3%
. 499
33.3%
0 499
33.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 998
66.7%
Other Punctuation 499
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2 499
50.0%
0 499
50.0%
Other Punctuation
ValueCountFrequency (%)
. 499
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1497
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2 499
33.3%
. 499
33.3%
0 499
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1497
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2 499
33.3%
. 499
33.3%
0 499
33.3%

diagnostico_primario_tipo_histologico_1
Categorical

HIGH CORRELATION  IMBALANCE  MISSING 

Distinct: 18
Distinct (%): 0.7%
Missing: 1607
Missing (%): 37.6%
Memory size: 33.5 KiB
NÃO-ESPECIAL - Carcinoma de mama ductal invasivo (CDI)/SOE
2376 
Carcinoma de mama lobular invasivo
 
80
outros
 
52
Carcinoma de mama metaplasico
 
34
Carcinoma de mama mucinoso
 
30
Other values (13)
 
93

Length

Max length: 59
Median length: 58
Mean length: 54.687054
Min length: 6

Characters and Unicode

Total characters: 145741
Distinct characters: 45
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: NÃO-ESPECIAL - Carcinoma de mama ductal invasivo (CDI)/SOE
2nd row: NÃO-ESPECIAL - Carcinoma de mama ductal invasivo (CDI)/SOE
3rd row: NÃO-ESPECIAL - Carcinoma de mama ductal invasivo (CDI)/SOE
4th row: NÃO-ESPECIAL - Carcinoma de mama ductal invasivo (CDI)/SOE
5th row: NÃO-ESPECIAL - Carcinoma de mama ductal invasivo (CDI)/SOE

Common Values

ValueCountFrequency (%)
NÃO-ESPECIAL - Carcinoma de mama ductal invasivo (CDI)/SOE 2376
55.6%
Carcinoma de mama lobular invasivo 80
 
1.9%
outros 52
 
1.2%
Carcinoma de mama metaplasico 34
 
0.8%
Carcinoma de mama mucinoso 30
 
0.7%
Carcinoma de mama papilifero 21
 
0.5%
Carcinoma de mama medular 16
 
0.4%
Carcinoma de mama micropapilar 11
 
0.3%
Carcinoma de mama misto (ductal e lobular) invasivo 8
 
0.2%
Carcinoma mamário invasivo multifocal 7
 
0.2%
Other values (8) 30
 
0.7%
(Missing) 1607
37.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
carcinoma 2610
12.9%
de 2595
12.9%
mama 2595
12.9%
invasivo 2482
12.3%
ductal 2390
11.9%
não-especial 2376
11.8%
cdi)/soe 2376
11.8%
2376
11.8%
lobular 96
 
0.5%
outros 52
 
0.3%
Other values (23) 214
 
1.1%

Most occurring characters

ValueCountFrequency (%)
17497
 
12.0%
a 15523
 
10.7%
m 7938
 
5.4%
i 7767
 
5.3%
C 7372
 
5.1%
E 7129
 
4.9%
o 5476
 
3.8%
n 5128
 
3.5%
c 5098
 
3.5%
d 5015
 
3.4%
Other values (35) 61798
42.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 73137
50.2%
Uppercase Letter 43199
29.6%
Space Separator 17497
 
12.0%
Dash Punctuation 4752
 
3.3%
Open Punctuation 2390
 
1.6%
Close Punctuation 2390
 
1.6%
Other Punctuation 2376
 
1.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 15523
21.2%
m 7938
10.9%
i 7767
10.6%
o 5476
 
7.5%
n 5128
 
7.0%
c 5098
 
7.0%
d 5015
 
6.9%
v 4954
 
6.8%
r 2837
 
3.9%
e 2707
 
3.7%
Other values (11) 10694
14.6%
Uppercase Letter
ValueCountFrequency (%)
C 7372
17.1%
E 7129
16.5%
O 4783
11.1%
I 4782
11.1%
S 4758
11.0%
A 2403
 
5.6%
N 2391
 
5.5%
P 2387
 
5.5%
D 2382
 
5.5%
L 2378
 
5.5%
Other values (9) 2434
 
5.6%
Space Separator
ValueCountFrequency (%)
17497
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 4752
100.0%
Open Punctuation
ValueCountFrequency (%)
( 2390
100.0%
Close Punctuation
ValueCountFrequency (%)
) 2390
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 2376
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 116336
79.8%
Common 29405
 
20.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 15523
 
13.3%
m 7938
 
6.8%
i 7767
 
6.7%
C 7372
 
6.3%
E 7129
 
6.1%
o 5476
 
4.7%
n 5128
 
4.4%
c 5098
 
4.4%
d 5015
 
4.3%
v 4954
 
4.3%
Other values (30) 44936
38.6%
Common
ValueCountFrequency (%)
17497
59.5%
- 4752
 
16.2%
( 2390
 
8.1%
) 2390
 
8.1%
/ 2376
 
8.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 143339
98.4%
None 2402
 
1.6%

Most frequent character per block

ASCII
ValueCountFrequency (%)
17497
 
12.2%
a 15523
 
10.8%
m 7938
 
5.5%
i 7767
 
5.4%
C 7372
 
5.1%
E 7129
 
5.0%
o 5476
 
3.8%
n 5128
 
3.6%
c 5098
 
3.6%
d 5015
 
3.5%
Other values (29) 59396
41.4%
None
ValueCountFrequency (%)
à 2376
98.9%
á 7
 
0.3%
í 6
 
0.2%
Á 5
 
0.2%
Ó 5
 
0.2%
ó 3
 
0.1%

diagnostico_primario_tipo_histologico_2
Categorical

HIGH CORRELATION  IMBALANCE  MISSING 

Distinct: 6
Distinct (%): 2.1%
Missing: 3984
Missing (%): 93.3%
Memory size: 33.5 KiB
NÃO-ESPECIAL - Carcinoma de mama ductal invasivo (CDI)/SOE
273 
outros
 
7
Carcinoma de mama metaplasico
 
5
Carcinoma de mama micropapilar
 
1
Carcinoma de mama cistico adenoide
 
1

Length

Max length: 58
Median length: 58
Mean length: 55.96875
Min length: 6

Characters and Unicode

Total characters: 16119
Distinct characters: 32
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3
Unique (%): 1.0%

Sample

1st row: NÃO-ESPECIAL - Carcinoma de mama ductal invasivo (CDI)/SOE
2nd row: NÃO-ESPECIAL - Carcinoma de mama ductal invasivo (CDI)/SOE
3rd row: NÃO-ESPECIAL - Carcinoma de mama ductal invasivo (CDI)/SOE
4th row: NÃO-ESPECIAL - Carcinoma de mama ductal invasivo (CDI)/SOE
5th row: NÃO-ESPECIAL - Carcinoma de mama ductal invasivo (CDI)/SOE

Common Values

ValueCountFrequency (%)
NÃO-ESPECIAL - Carcinoma de mama ductal invasivo (CDI)/SOE 273
 
6.4%
outros 7
 
0.2%
Carcinoma de mama metaplasico 5
 
0.1%
Carcinoma de mama micropapilar 1
 
< 0.1%
Carcinoma de mama cistico adenoide 1
 
< 0.1%
Carcinoma de mama lobular invasivo 1
 
< 0.1%
(Missing) 3984
93.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
carcinoma 281
12.6%
de 281
12.6%
mama 281
12.6%
invasivo 274
12.3%
não-especial 273
12.3%
273
12.3%
ductal 273
12.3%
cdi)/soe 273
12.3%
outros 7
 
0.3%
metaplasico 5
 
0.2%
Other values (4) 4
 
0.2%

Most occurring characters

ValueCountFrequency (%)
1937
 
12.0%
a 1685
 
10.5%
m 849
 
5.3%
i 839
 
5.2%
C 827
 
5.1%
E 819
 
5.1%
o 578
 
3.6%
c 562
 
3.5%
d 556
 
3.4%
n 556
 
3.4%
Other values (22) 6911
42.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 7895
49.0%
Uppercase Letter 4922
30.5%
Space Separator 1937
 
12.0%
Dash Punctuation 546
 
3.4%
Close Punctuation 273
 
1.7%
Other Punctuation 273
 
1.7%
Open Punctuation 273
 
1.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 1685
21.3%
m 849
10.8%
i 839
10.6%
o 578
 
7.3%
c 562
 
7.1%
d 556
 
7.0%
n 556
 
7.0%
v 548
 
6.9%
r 291
 
3.7%
e 288
 
3.6%
Other values (6) 1143
14.5%
Uppercase Letter
ValueCountFrequency (%)
C 827
16.8%
E 819
16.6%
I 546
11.1%
S 546
11.1%
O 546
11.1%
D 273
 
5.5%
N 273
 
5.5%
à 273
 
5.5%
L 273
 
5.5%
A 273
 
5.5%
Space Separator
ValueCountFrequency (%)
1937
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 546
100.0%
Close Punctuation
ValueCountFrequency (%)
) 273
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 273
100.0%
Open Punctuation
ValueCountFrequency (%)
( 273
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 12817
79.5%
Common 3302
 
20.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 1685
 
13.1%
m 849
 
6.6%
i 839
 
6.5%
C 827
 
6.5%
E 819
 
6.4%
o 578
 
4.5%
c 562
 
4.4%
d 556
 
4.3%
n 556
 
4.3%
v 548
 
4.3%
Other values (17) 4998
39.0%
Common
ValueCountFrequency (%)
1937
58.7%
- 546
 
16.5%
) 273
 
8.3%
/ 273
 
8.3%
( 273
 
8.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 15846
98.3%
None 273
 
1.7%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1937
 
12.2%
a 1685
 
10.6%
m 849
 
5.4%
i 839
 
5.3%
C 827
 
5.2%
E 819
 
5.2%
o 578
 
3.6%
c 562
 
3.5%
d 556
 
3.5%
n 556
 
3.5%
Other values (21) 6638
41.9%
None
ValueCountFrequency (%)
à 273
100.0%

grau_histologico_1
Categorical

HIGH CORRELATION  MISSING 

Distinct: 3
Distinct (%): 0.3%
Missing: 3078
Missing (%): 72.1%
Memory size: 33.5 KiB
2.0
591 
3.0
482 
1.0
121 

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 3582
Distinct characters: 5
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3.0
2nd row: 3.0
3rd row: 3.0
4th row: 3.0
5th row: 2.0

Common Values

ValueCountFrequency (%)
2.0 591
 
13.8%
3.0 482
 
11.3%
1.0 121
 
2.8%
(Missing) 3078
72.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
2.0 591
49.5%
3.0 482
40.4%
1.0 121
 
10.1%
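
The same counts carry different percentages in the two tables for this variable because the denominators differ: Common Values divides by all rows, including missing ones, while the plot table divides by the non-missing rows only. A minimal sketch with the counts for grau_histologico_1 (variable names are illustrative):

```python
# Counts for grau_histologico_1, copied from the tables above.
counts = {"2.0": 591, "3.0": 482, "1.0": 121}
n_missing = 3078

n_present = sum(counts.values())  # rows with a recorded grade
n_rows = n_present + n_missing    # all rows in the dataset

# Share of all rows (as in Common Values).
share_of_all_rows = {k: round(100 * v / n_rows, 1) for k, v in counts.items()}
# Share of non-missing rows (as in the plot table).
share_of_present = {k: round(100 * v / n_present, 1) for k, v in counts.items()}

print(share_of_all_rows)
print(share_of_present)
```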

Most occurring characters

ValueCountFrequency (%)
. 1194
33.3%
0 1194
33.3%
2 591
16.5%
3 482
13.5%
1 121
 
3.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 2388
66.7%
Other Punctuation 1194
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 1194
50.0%
2 591
24.7%
3 482
20.2%
1 121
 
5.1%
Other Punctuation
ValueCountFrequency (%)
. 1194
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 3582
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
. 1194
33.3%
0 1194
33.3%
2 591
16.5%
3 482
13.5%
1 121
 
3.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3582
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
. 1194
33.3%
0 1194
33.3%
2 591
16.5%
3 482
13.5%
1 121
 
3.4%

grau_histologico_2
Categorical

HIGH CORRELATION  MISSING 

Distinct: 3
Distinct (%): 1.1%
Missing: 4001
Missing (%): 93.7%
Memory size: 33.5 KiB
2.0
151 
3.0
68 
1.0
52 

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 813
Distinct characters: 5
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3.0
2nd row: 3.0
3rd row: 2.0
4th row: 2.0
5th row: 2.0

Common Values

ValueCountFrequency (%)
2.0 151
 
3.5%
3.0 68
 
1.6%
1.0 52
 
1.2%
(Missing) 4001
93.7%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
2.0 151
55.7%
3.0 68
25.1%
1.0 52
 
19.2%

Most occurring characters

ValueCountFrequency (%)
. 271
33.3%
0 271
33.3%
2 151
18.6%
3 68
 
8.4%
1 52
 
6.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 542
66.7%
Other Punctuation 271
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 271
50.0%
2 151
27.9%
3 68
 
12.5%
1 52
 
9.6%
Other Punctuation
ValueCountFrequency (%)
. 271
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 813
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
. 271
33.3%
0 271
33.3%
2 151
18.6%
3 68
 
8.4%
1 52
 
6.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 813
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
. 271
33.3%
0 271
33.3%
2 151
18.6%
3 68
 
8.4%
1 52
 
6.4%

subtipo_tumoral_1
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): 0.1%
Missing: 2
Missing (%): < 0.1%
Memory size: 33.5 KiB
2.0
1434 
4.0
1019 
5.0
664 
3.0
658 
1.0
495 

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 12810
Distinct characters: 7
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 5.0
2nd row: 2.0
3rd row: 3.0
4th row: 2.0
5th row: 5.0

Common Values

ValueCountFrequency (%)
2.0 1434
33.6%
4.0 1019
23.9%
5.0 664
15.5%
3.0 658
15.4%
1.0 495
 
11.6%
(Missing) 2
 
< 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
2.0 1434
33.6%
4.0 1019
23.9%
5.0 664
15.6%
3.0 658
15.4%
1.0 495
 
11.6%

Most occurring characters

ValueCountFrequency (%)
. 4270
33.3%
0 4270
33.3%
2 1434
 
11.2%
4 1019
 
8.0%
5 664
 
5.2%
3 658
 
5.1%
1 495
 
3.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 8540
66.7%
Other Punctuation 4270
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 4270
50.0%
2 1434
 
16.8%
4 1019
 
11.9%
5 664
 
7.8%
3 658
 
7.7%
1 495
 
5.8%
Other Punctuation
ValueCountFrequency (%)
. 4270
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 12810
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
. 4270
33.3%
0 4270
33.3%
2 1434
 
11.2%
4 1019
 
8.0%
5 664
 
5.2%
3 658
 
5.1%
1 495
 
3.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 12810
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
. 4270
33.3%
0 4270
33.3%
2 1434
 
11.2%
4 1019
 
8.0%
5 664
 
5.2%
3 658
 
5.1%
1 495
 
3.9%

subtipo_tumoral_2
Categorical

HIGH CORRELATION  MISSING 

Distinct: 5
Distinct (%): 1.2%
Missing: 3852
Missing (%): 90.2%
Memory size: 33.5 KiB
2.0
198 
4.0
76 
1.0
66 
5.0
65 
3.0
 
15

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 1260
Distinct characters: 7
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 5.0
2nd row: 5.0
3rd row: 5.0
4th row: 2.0
5th row: 2.0

Common Values

ValueCountFrequency (%)
2.0 198
 
4.6%
4.0 76
 
1.8%
1.0 66
 
1.5%
5.0 65
 
1.5%
3.0 15
 
0.4%
(Missing) 3852
90.2%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
2.0 198
47.1%
4.0 76
 
18.1%
1.0 66
 
15.7%
5.0 65
 
15.5%
3.0 15
 
3.6%

Most occurring characters

ValueCountFrequency (%)
. 420
33.3%
0 420
33.3%
2 198
15.7%
4 76
 
6.0%
1 66
 
5.2%
5 65
 
5.2%
3 15
 
1.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 840
66.7%
Other Punctuation 420
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 420
50.0%
2 198
23.6%
4 76
 
9.0%
1 66
 
7.9%
5 65
 
7.7%
3 15
 
1.8%
Other Punctuation
ValueCountFrequency (%)
. 420
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1260
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
. 420
33.3%
0 420
33.3%
2 198
15.7%
4 76
 
6.0%
1 66
 
5.2%
5 65
 
5.2%
3 15
 
1.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1260
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
. 420
33.3%
0 420
33.3%
2 198
15.7%
4 76
 
6.0%
1 66
 
5.2%
5 65
 
5.2%
3 15
 
1.2%

receptor_de_estrogenio_1
Categorical

HIGH CORRELATION  MISSING 

Distinct: 2
Distinct (%): 0.1%
Missing: 451
Missing (%): 10.6%
Memory size: 33.5 KiB
positivo
2558 
negativo
1263 

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 30568
Distinct characters: 10
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: positivo
2nd row: positivo
3rd row: positivo
4th row: positivo
5th row: negativo

Common Values

ValueCountFrequency (%)
positivo 2558
59.9%
negativo 1263
29.6%
(Missing) 451
 
10.6%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
positivo 2558
66.9%
negativo 1263
33.1%

Most occurring characters

ValueCountFrequency (%)
o 6379
20.9%
i 6379
20.9%
t 3821
12.5%
v 3821
12.5%
p 2558
8.4%
s 2558
8.4%
n 1263
 
4.1%
e 1263
 
4.1%
g 1263
 
4.1%
a 1263
 
4.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 30568
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 6379
20.9%
i 6379
20.9%
t 3821
12.5%
v 3821
12.5%
p 2558
8.4%
s 2558
8.4%
n 1263
 
4.1%
e 1263
 
4.1%
g 1263
 
4.1%
a 1263
 
4.1%

Most occurring scripts

ValueCountFrequency (%)
Latin 30568
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 6379
20.9%
i 6379
20.9%
t 3821
12.5%
v 3821
12.5%
p 2558
8.4%
s 2558
8.4%
n 1263
 
4.1%
e 1263
 
4.1%
g 1263
 
4.1%
a 1263
 
4.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 30568
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 6379
20.9%
i 6379
20.9%
t 3821
12.5%
v 3821
12.5%
p 2558
8.4%
s 2558
8.4%
n 1263
 
4.1%
e 1263
 
4.1%
g 1263
 
4.1%
a 1263
 
4.1%

receptor_de_estrogenio_2
Categorical

Alerts: HIGH CORRELATION, IMBALANCE, MISSING

Distinct: 3
Distinct (%): 0.7%
Missing: 3849
Missing (%): 90.1%
Memory size: 33.5 KiB

positivo  335
negativo  87
não realizado  1

Length

Max length: 13
Median length: 8
Mean length: 8.0118203
Min length: 8

Characters and Unicode

Total characters: 3389
Distinct characters: 16
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
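The length and character totals for this variable can be cross-checked from the category counts alone. The sketch below assumes the stored values are exactly the three strings shown, with no surrounding whitespace, which the report does not state explicitly.

```python
# Category counts for receptor_de_estrogenio_2, copied from the report.
# "\u00e3" is the precomposed letter "ã", so "não realizado" has 13 characters.
counts = {"positivo": 335, "negativo": 87, "n\u00e3o realizado": 1}

n_values = sum(counts.values())
total_chars = sum(len(value) * count for value, count in counts.items())
mean_length = total_chars / n_values

print(n_values, total_chars, round(mean_length, 7))
```

The total matches the reported 3389 characters, and the mean matches the reported mean length of 8.0118203.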

Unique

Unique: 1
Unique (%): 0.2%

Sample

1st row: positivo
2nd row: positivo
3rd row: positivo
4th row: positivo
5th row: positivo

Common Values

Value  Count  Frequency (%)
positivo  335  7.8%
negativo  87  2.0%
não realizado  1  < 0.1%
(Missing)  3849  90.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
positivo  335  79.0%
negativo  87  20.5%
não  1  0.2%
realizado  1  0.2%

Most occurring characters

ValueCountFrequency (%)
o 759
22.4%
i 758
22.4%
t 422
12.5%
v 422
12.5%
p 335
9.9%
s 335
9.9%
a 89
 
2.6%
n 88
 
2.6%
e 88
 
2.6%
g 87
 
2.6%
Other values (6) 6
 
0.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3388
> 99.9%
Space Separator 1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 759
22.4%
i 758
22.4%
t 422
12.5%
v 422
12.5%
p 335
9.9%
s 335
9.9%
a 89
 
2.6%
n 88
 
2.6%
e 88
 
2.6%
g 87
 
2.6%
Other values (5) 5
 
0.1%
Space Separator
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 3388
> 99.9%
Common 1
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 759
22.4%
i 758
22.4%
t 422
12.5%
v 422
12.5%
p 335
9.9%
s 335
9.9%
a 89
 
2.6%
n 88
 
2.6%
e 88
 
2.6%
g 87
 
2.6%
Other values (5) 5
 
0.1%
Common
ValueCountFrequency (%)
1
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3388
> 99.9%
None 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 759
22.4%
i 758
22.4%
t 422
12.5%
v 422
12.5%
p 335
9.9%
s 335
9.9%
a 89
 
2.6%
n 88
 
2.6%
e 88
 
2.6%
g 87
 
2.6%
Other values (5) 5
 
0.1%
None
ValueCountFrequency (%)
ã 1
100.0%

receptor_de_progesterona_1
Categorical

Alerts: HIGH CORRELATION, MISSING

Distinct: 3
Distinct (%): 0.1%
Missing: 449
Missing (%): 10.5%
Memory size: 33.5 KiB

positivo  2223
negativo  1587
inconclusivo  13

Length

Max length: 12
Median length: 8
Mean length: 8.0136019
Min length: 8

Characters and Unicode

Total characters: 30636
Distinct characters: 13
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: positivo
2nd row: negativo
3rd row: positivo
4th row: positivo
5th row: negativo

Common Values

Value  Count  Frequency (%)
positivo  2223  52.0%
negativo  1587  37.1%
inconclusivo  13  0.3%
(Missing)  449  10.5%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
positivo  2223  58.1%
negativo  1587  41.5%
inconclusivo  13  0.3%

Most occurring characters

ValueCountFrequency (%)
o 6059
19.8%
i 6059
19.8%
v 3823
12.5%
t 3810
12.4%
s 2236
 
7.3%
p 2223
 
7.3%
n 1613
 
5.3%
e 1587
 
5.2%
g 1587
 
5.2%
a 1587
 
5.2%
Other values (3) 52
 
0.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 30636
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 6059
19.8%
i 6059
19.8%
v 3823
12.5%
t 3810
12.4%
s 2236
 
7.3%
p 2223
 
7.3%
n 1613
 
5.3%
e 1587
 
5.2%
g 1587
 
5.2%
a 1587
 
5.2%
Other values (3) 52
 
0.2%

Most occurring scripts

ValueCountFrequency (%)
Latin 30636
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 6059
19.8%
i 6059
19.8%
v 3823
12.5%
t 3810
12.4%
s 2236
 
7.3%
p 2223
 
7.3%
n 1613
 
5.3%
e 1587
 
5.2%
g 1587
 
5.2%
a 1587
 
5.2%
Other values (3) 52
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 30636
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 6059
19.8%
i 6059
19.8%
v 3823
12.5%
t 3810
12.4%
s 2236
 
7.3%
p 2223
 
7.3%
n 1613
 
5.3%
e 1587
 
5.2%
g 1587
 
5.2%
a 1587
 
5.2%
Other values (3) 52
 
0.2%

receptor_de_progesterona_2
Categorical

Alerts: HIGH CORRELATION, MISSING

Distinct: 4
Distinct (%): 0.9%
Missing: 3849
Missing (%): 90.1%
Memory size: 33.5 KiB

positivo  276
negativo  141
inconclusivo  5
não realizado  1

Length

Max length: 13
Median length: 8
Mean length: 8.0591017
Min length: 8

Characters and Unicode

Total characters: 3409
Distinct characters: 18
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): 0.2%

Sample

1st row: negativo
2nd row: positivo
3rd row: positivo
4th row: positivo
5th row: positivo

Common Values

Value  Count  Frequency (%)
positivo  276  6.5%
negativo  141  3.3%
inconclusivo  5  0.1%
não realizado  1  < 0.1%
(Missing)  3849  90.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
positivo  276  65.1%
negativo  141  33.3%
inconclusivo  5  1.2%
não  1  0.2%
realizado  1  0.2%

Most occurring characters

ValueCountFrequency (%)
o 705
20.7%
i 704
20.7%
v 422
12.4%
t 417
12.2%
s 281
 
8.2%
p 276
 
8.1%
n 152
 
4.5%
a 143
 
4.2%
e 142
 
4.2%
g 141
 
4.1%
Other values (8) 26
 
0.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3408
> 99.9%
Space Separator 1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 705
20.7%
i 704
20.7%
v 422
12.4%
t 417
12.2%
s 281
 
8.2%
p 276
 
8.1%
n 152
 
4.5%
a 143
 
4.2%
e 142
 
4.2%
g 141
 
4.1%
Other values (7) 25
 
0.7%
Space Separator
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 3408
> 99.9%
Common 1
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 705
20.7%
i 704
20.7%
v 422
12.4%
t 417
12.2%
s 281
 
8.2%
p 276
 
8.1%
n 152
 
4.5%
a 143
 
4.2%
e 142
 
4.2%
g 141
 
4.1%
Other values (7) 25
 
0.7%
Common
ValueCountFrequency (%)
1
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3408
> 99.9%
None 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 705
20.7%
i 704
20.7%
v 422
12.4%
t 417
12.2%
s 281
 
8.2%
p 276
 
8.1%
n 152
 
4.5%
a 143
 
4.2%
e 142
 
4.2%
g 141
 
4.1%
Other values (7) 25
 
0.7%
None
ValueCountFrequency (%)
ã 1
100.0%

ki67_>14%_1
Categorical

Alerts: HIGH CORRELATION, IMBALANCE, MISSING

Distinct: 4
Distinct (%): 0.1%
Missing: 748
Missing (%): 17.5%
Memory size: 33.5 KiB

positivo  2939
negativo  569
não realizado  10
inconclusivo  6

Length

Max length: 13
Median length: 8
Mean length: 8.0209989
Min length: 8

Characters and Unicode

Total characters: 28266
Distinct characters: 18
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: positivo
2nd row: positivo
3rd row: positivo
4th row: negativo
5th row: positivo

Common Values

Value  Count  Frequency (%)
positivo  2939  68.8%
negativo  569  13.3%
não realizado  10  0.2%
inconclusivo  6  0.1%
(Missing)  748  17.5%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
positivo  2939  83.2%
negativo  569  16.1%
não  10  0.3%
realizado  10  0.3%
inconclusivo  6  0.2%

Most occurring characters

ValueCountFrequency (%)
o 6479
22.9%
i 6469
22.9%
v 3514
12.4%
t 3508
12.4%
s 2945
10.4%
p 2939
10.4%
n 591
 
2.1%
a 589
 
2.1%
e 579
 
2.0%
g 569
 
2.0%
Other values (8) 84
 
0.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 28256
> 99.9%
Space Separator 10
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 6479
22.9%
i 6469
22.9%
v 3514
12.4%
t 3508
12.4%
s 2945
10.4%
p 2939
10.4%
n 591
 
2.1%
a 589
 
2.1%
e 579
 
2.0%
g 569
 
2.0%
Other values (7) 74
 
0.3%
Space Separator
ValueCountFrequency (%)
10
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 28256
> 99.9%
Common 10
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 6479
22.9%
i 6469
22.9%
v 3514
12.4%
t 3508
12.4%
s 2945
10.4%
p 2939
10.4%
n 591
 
2.1%
a 589
 
2.1%
e 579
 
2.0%
g 569
 
2.0%
Other values (7) 74
 
0.3%
Common
ValueCountFrequency (%)
10
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 28256
> 99.9%
None 10
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 6479
22.9%
i 6469
22.9%
v 3514
12.4%
t 3508
12.4%
s 2945
10.4%
p 2939
10.4%
n 591
 
2.1%
a 589
 
2.1%
e 579
 
2.0%
g 569
 
2.0%
Other values (7) 74
 
0.3%
None
ValueCountFrequency (%)
ã 10
100.0%

ki67_>14%_2
Categorical

Alerts: HIGH CORRELATION, IMBALANCE, MISSING

Distinct: 3
Distinct (%): 0.6%
Missing: 3796
Missing (%): 88.9%
Memory size: 33.5 KiB

positivo  399
negativo  76
inconclusivo  1

Length

Max length: 12
Median length: 8
Mean length: 8.0084034
Min length: 8

Characters and Unicode

Total characters: 3812
Distinct characters: 13
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): 0.2%

Sample

1st row: positivo
2nd row: positivo
3rd row: positivo
4th row: positivo
5th row: positivo

Common Values

Value  Count  Frequency (%)
positivo  399  9.3%
negativo  76  1.8%
inconclusivo  1  < 0.1%
(Missing)  3796  88.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
positivo  399  83.8%
negativo  76  16.0%
inconclusivo  1  0.2%

Most occurring characters

ValueCountFrequency (%)
o 876
23.0%
i 876
23.0%
v 476
12.5%
t 475
12.5%
s 400
10.5%
p 399
10.5%
n 78
 
2.0%
e 76
 
2.0%
g 76
 
2.0%
a 76
 
2.0%
Other values (3) 4
 
0.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3812
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 876
23.0%
i 876
23.0%
v 476
12.5%
t 475
12.5%
s 400
10.5%
p 399
10.5%
n 78
 
2.0%
e 76
 
2.0%
g 76
 
2.0%
a 76
 
2.0%
Other values (3) 4
 
0.1%

Most occurring scripts

ValueCountFrequency (%)
Latin 3812
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 876
23.0%
i 876
23.0%
v 476
12.5%
t 475
12.5%
s 400
10.5%
p 399
10.5%
n 78
 
2.0%
e 76
 
2.0%
g 76
 
2.0%
a 76
 
2.0%
Other values (3) 4
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3812
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 876
23.0%
i 876
23.0%
v 476
12.5%
t 475
12.5%
s 400
10.5%
p 399
10.5%
n 78
 
2.0%
e 76
 
2.0%
g 76
 
2.0%
a 76
 
2.0%
Other values (3) 4
 
0.1%

receptor_de_progesterona_quantificacao_%_1
Categorical

Alerts: HIGH CORRELATION, MISSING

Distinct: 41
Distinct (%): 2.6%
Missing: 2667
Missing (%): 62.4%
Memory size: 33.5 KiB

90  244
100  225
80  160
70  98
40  95
Other values (36)  783

Length

Max length: 12
Median length: 2
Mean length: 2.0866044
Min length: 1

Characters and Unicode

Total characters: 3349
Distinct characters: 25
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 11
Unique (%): 0.7%

Sample

1st row: 5
2nd row: 90
3rd row: 40
4th row: 50
5th row: 10

Common Values

Value  Count  Frequency (%)
90  244  5.7%
100  225  5.3%
80  160  3.7%
70  98  2.3%
40  95  2.2%
20  85  2.0%
95  81  1.9%
10  74  1.7%
60  72  1.7%
0  72  1.7%
Other values (31)  399  9.3%
(Missing)  2667  62.4%

Length

Histogram of lengths of the category

Value  Count  Frequency (%)
90  244  15.2%
100  225  14.0%
80  160  10.0%
70  99  6.2%
40  95  5.9%
20  85  5.3%
95  81  5.0%
10  74  4.6%
neg  72  4.5%
0  72  4.5%
Other values (29)  398  24.8%

Most occurring characters

ValueCountFrequency (%)
0 1471
43.9%
9 355
 
10.6%
1 341
 
10.2%
5 261
 
7.8%
8 194
 
5.8%
6 117
 
3.5%
7 113
 
3.4%
2 102
 
3.0%
4 100
 
3.0%
n 68
 
2.0%
Other values (15) 227
 
6.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 3119
93.1%
Lowercase Letter 210
 
6.3%
Uppercase Letter 18
 
0.5%
Dash Punctuation 1
 
< 0.1%
Other Punctuation 1
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 1471
47.2%
9 355
 
11.4%
1 341
 
10.9%
5 261
 
8.4%
8 194
 
6.2%
6 117
 
3.8%
7 113
 
3.6%
2 102
 
3.3%
4 100
 
3.2%
3 65
 
2.1%
Lowercase Letter
ValueCountFrequency (%)
n 68
32.4%
e 66
31.4%
g 66
31.4%
i 2
 
1.0%
c 2
 
1.0%
o 2
 
1.0%
l 1
 
0.5%
u 1
 
0.5%
s 1
 
0.5%
v 1
 
0.5%
Uppercase Letter
ValueCountFrequency (%)
N 6
33.3%
E 6
33.3%
G 6
33.3%
Dash Punctuation
ValueCountFrequency (%)
- 1
100.0%
Other Punctuation
ValueCountFrequency (%)
% 1
100.0%
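The category tables above suggest why this quantification column was typed as Categorical rather than numeric: alongside digits it contains letters (consistent with tokens such as `neg` and `NEG`), a dash and a percent sign. The sketch below shows one way such entries could be separated from numeric percentages. The function name `parse_percentage` is hypothetical, the exact raw strings are an assumption (the report shows only `neg` directly), and whether a textual "negative" should be recoded as 0 is a clinical decision that the sketch deliberately leaves open by returning `None`.

```python
def parse_percentage(raw):
    """Return a float in [0, 100], or None when the entry is not a usable number."""
    if raw is None:
        return None
    # Tolerate surrounding whitespace and a trailing percent sign, e.g. " 5% ".
    text = raw.strip().rstrip("%").strip()
    try:
        value = float(text)
    except ValueError:
        return None  # textual entries such as "neg", "NEG" or "-"
    # Reject out-of-range values (this also rejects NaN and infinity).
    return value if 0 <= value <= 100 else None


# Hypothetical raw entries, not rows read from the dataset.
for raw in ["90", " 5% ", "neg", "NEG", "-", "150"]:
    print(repr(raw), "->", parse_percentage(raw))
```

Keeping unparsed entries as `None` makes it possible to count and review them before deciding how to recode them.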

Most occurring scripts

ValueCountFrequency (%)
Common 3121
93.2%
Latin 228
 
6.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 68
29.8%
e 66
28.9%
g 66
28.9%
N 6
 
2.6%
E 6
 
2.6%
G 6
 
2.6%
i 2
 
0.9%
c 2
 
0.9%
o 2
 
0.9%
l 1
 
0.4%
Other values (3) 3
 
1.3%
Common
ValueCountFrequency (%)
0 1471
47.1%
9 355
 
11.4%
1 341
 
10.9%
5 261
 
8.4%
8 194
 
6.2%
6 117
 
3.7%
7 113
 
3.6%
2 102
 
3.3%
4 100
 
3.2%
3 65
 
2.1%
Other values (2) 2
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3349
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 1471
43.9%
9 355
 
10.6%
1 341
 
10.2%
5 261
 
7.8%
8 194
 
5.8%
6 117
 
3.5%
7 113
 
3.4%
2 102
 
3.0%
4 100
 
3.0%
n 68
 
2.0%
Other values (15) 227
 
6.8%

receptor_de_progesterona_quantificacao_%_2
Real number (ℝ)

Alerts: HIGH CORRELATION, MISSING

Distinct: 25
Distinct (%): 8.8%
Missing: 3988
Missing (%): 93.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 64.5
Minimum: 0
Maximum: 100
Zeros: 12
Zeros (%): 0.3%
Negative: 0
Negative (%): 0.0%
Memory size: 33.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 40
Median: 80
Q3: 95
95-th percentile: 100
Maximum: 100
Range: 100
Interquartile range (IQR): 55

Descriptive statistics

Standard deviation: 34.241658
Coefficient of variation (CV): 0.53087842
Kurtosis: -1.0770396
Mean: 64.5
Median Absolute Deviation (MAD): 20
Skewness: -0.63093832
Sum: 18318
Variance: 1172.4912
Monotonicity: Not monotonic
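Several of the statistics above are simple functions of one another, which gives a quick consistency check on the report. The sketch below recomputes them from the reported figures for receptor_de_progesterona_quantificacao_%_2; the variable names are illustrative, and the number of observations (4272) comes from the report overview.

```python
n_rows, n_missing = 4272, 3988
total = 18318                 # reported Sum
std = 34.241658               # reported Standard deviation
q1, q3 = 40, 95
minimum, maximum = 0, 100

n = n_rows - n_missing        # non-missing observations
mean = total / n              # Mean = Sum / n
iqr = q3 - q1                 # Interquartile range = Q3 - Q1
value_range = maximum - minimum
cv = std / mean               # Coefficient of variation = std / mean
variance = std ** 2           # Variance = std squared

print(n, mean, iqr, value_range, round(cv, 8), round(variance, 4))
```

The recomputed values agree with the reported mean (64.5), IQR (55), range (100), CV (0.53087842) and variance (1172.4912) up to rounding.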
Histogram with fixed size bins (bins=25)

Common Values

Value  Count  Frequency (%)
100  56  1.3%
80  40  0.9%
90  37  0.9%
40  22  0.5%
95  19  0.4%
10  15  0.4%
70  14  0.3%
50  14  0.3%
20  13  0.3%
0  12  0.3%
Other values (15)  42  1.0%
(Missing)  3988  93.4%

Minimum 10 values

Value  Count  Frequency (%)
0  12  0.3%
1  4  0.1%
2  1  < 0.1%
3  1  < 0.1%
5  6  0.1%
8  1  < 0.1%
10  15  0.4%
15  4  0.1%
20  13  0.3%
25  1  < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
100  56  1.3%
99  1  < 0.1%
98  1  < 0.1%
95  19  0.4%
90  37  0.9%
85  1  < 0.1%
80  40  0.9%
70  14  0.3%
66  1  < 0.1%
60  11  0.3%

receptorde_estrogenio_quantificacao_%_1
Categorical

Alerts: HIGH CORRELATION, MISSING

Distinct: 39
Distinct (%): 2.1%
Missing: 2449
Missing (%): 57.3%
Memory size: 33.5 KiB

100  580
90  393
95  197
80  128
0  68
Other values (34)  457

Length

Max length: 4
Median length: 2
Mean length: 2.2945694
Min length: 1

Characters and Unicode

Total characters: 4183
Distinct characters: 17
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 15
Unique (%): 0.8%

Sample

1st row: 60
2nd row: 90
3rd row: 40
4th row: 50
5th row: 10

Common Values

Value  Count  Frequency (%)
100  580  13.6%
90  393  9.2%
95  197  4.6%
80  128  3.0%
0  68  1.6%
70  61  1.4%
neg  53  1.2%
98  40  0.9%
60  39  0.9%
40  34  0.8%
Other values (29)  230  5.4%
(Missing)  2449  57.3%

Length

Histogram of lengths of the category

Value  Count  Frequency (%)
100  580  31.8%
90  393  21.6%
95  197  10.8%
80  128  7.0%
0  68  3.7%
70  61  3.3%
neg  57  3.1%
98  40  2.2%
60  39  2.1%
10  34  1.9%
Other values (28)  226  12.4%

Most occurring characters

ValueCountFrequency (%)
0 1990
47.6%
9 694
 
16.6%
1 634
 
15.2%
5 260
 
6.2%
8 177
 
4.2%
6 93
 
2.2%
7 71
 
1.7%
g 53
 
1.3%
e 53
 
1.3%
n 53
 
1.3%
Other values (7) 105
 
2.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 4011
95.9%
Lowercase Letter 159
 
3.8%
Uppercase Letter 12
 
0.3%
Dash Punctuation 1
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 1990
49.6%
9 694
 
17.3%
1 634
 
15.8%
5 260
 
6.5%
8 177
 
4.4%
6 93
 
2.3%
7 71
 
1.8%
2 37
 
0.9%
4 35
 
0.9%
3 20
 
0.5%
Lowercase Letter
ValueCountFrequency (%)
g 53
33.3%
e 53
33.3%
n 53
33.3%
Uppercase Letter
ValueCountFrequency (%)
N 4
33.3%
E 4
33.3%
G 4
33.3%
Dash Punctuation
ValueCountFrequency (%)
- 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 4012
95.9%
Latin 171
 
4.1%

Most frequent character per script

Common
ValueCountFrequency (%)
0 1990
49.6%
9 694
 
17.3%
1 634
 
15.8%
5 260
 
6.5%
8 177
 
4.4%
6 93
 
2.3%
7 71
 
1.8%
2 37
 
0.9%
4 35
 
0.9%
3 20
 
0.5%
Latin
ValueCountFrequency (%)
g 53
31.0%
e 53
31.0%
n 53
31.0%
N 4
 
2.3%
E 4
 
2.3%
G 4
 
2.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4183
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 1990
47.6%
9 694
 
16.6%
1 634
 
15.2%
5 260
 
6.2%
8 177
 
4.2%
6 93
 
2.2%
7 71
 
1.7%
g 53
 
1.3%
e 53
 
1.3%
n 53
 
1.3%
Other values (7) 105
 
2.5%

receptorde_estrogenio_quantificacao_%_2
Real number (ℝ)

Alerts: HIGH CORRELATION, MISSING

Distinct: 24
Distinct (%): 7.1%
Missing: 3932
Missing (%): 92.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 83.108824
Minimum: 0
Maximum: 100
Zeros: 11
Zeros (%): 0.3%
Negative: 0
Negative (%): 0.0%
Memory size: 33.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 10
Q1: 80
Median: 95
Q3: 100
95-th percentile: 100
Maximum: 100
Range: 100
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 27.550489
Coefficient of variation (CV): 0.33149896
Kurtosis: 2.6084117
Mean: 83.108824
Median Absolute Deviation (MAD): 5
Skewness: -1.948701
Sum: 28257
Variance: 759.02942
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=24)

Common Values

Value  Count  Frequency (%)
100  135  3.2%
90  69  1.6%
95  33  0.8%
80  21  0.5%
0  11  0.3%
70  10  0.2%
99  8  0.2%
20  8  0.2%
60  7  0.2%
50  6  0.1%
Other values (14)  32  0.7%
(Missing)  3932  92.0%

Minimum 10 values

Value  Count  Frequency (%)
0  11  0.3%
1  1  < 0.1%
4  1  < 0.1%
9  2  < 0.1%
10  6  0.1%
12  1  < 0.1%
20  8  0.2%
25  1  < 0.1%
30  5  0.1%
40  4  0.1%

Maximum 10 values

Value  Count  Frequency (%)
100  135  3.2%
99  8  0.2%
98  5  0.1%
97  1  < 0.1%
95  33  0.8%
90  69  1.6%
85  1  < 0.1%
80  21  0.5%
70  10  0.2%
67  1  < 0.1%

indice_h_receptorde_progesterona_1
Real number (ℝ)

Alerts: HIGH CORRELATION, MISSING

Distinct: 35
Distinct (%): 7.6%
Missing: 3811
Missing (%): 89.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 195.70933
Minimum: 0
Maximum: 300
Zeros: 3
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 33.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 15
Q1: 120
Median: 240
Q3: 270
95-th percentile: 300
Maximum: 300
Range: 300
Interquartile range (IQR): 150

Descriptive statistics

Standard deviation: 98.425728
Coefficient of variation (CV): 0.50291792
Kurtosis: -1.0237731
Mean: 195.70933
Median Absolute Deviation (MAD): 60
Skewness: -0.65518036
Sum: 90222
Variance: 9687.624
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=35)

Common Values

Value  Count  Frequency (%)
300  73  1.7%
270  72  1.7%
240  61  1.4%
120  33  0.8%
285  33  0.8%
210  25  0.6%
60  20  0.5%
180  17  0.4%
150  13  0.3%
90  13  0.3%
Other values (25)  101  2.4%
(Missing)  3811  89.2%

Minimum 10 values

Value  Count  Frequency (%)
0  3  0.1%
1  2  < 0.1%
2  1  < 0.1%
4  1  < 0.1%
5  2  < 0.1%
10  10  0.2%
15  11  0.3%
20  9  0.2%
30  9  0.2%
40  10  0.2%

Maximum 10 values

Value  Count  Frequency (%)
300  73  1.7%
294  6  0.1%
285  33  0.8%
282  1  < 0.1%
270  72  1.7%
255  3  0.1%
240  61  1.4%
225  4  0.1%
210  25  0.6%
200  2  < 0.1%

indice_h_receptorde_progesterona_2
Real number (ℝ)

Alerts: HIGH CORRELATION, MISSING

Distinct: 25
Distinct (%): 19.1%
Missing: 4141
Missing (%): 96.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 201.94656
Minimum: 1
Maximum: 300
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 33.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 12.5
Q1: 120
Median: 240
Q3: 289.5
95-th percentile: 300
Maximum: 300
Range: 299
Interquartile range (IQR): 169.5

Descriptive statistics

Standard deviation: 98.918872
Coefficient of variation (CV): 0.48982696
Kurtosis: -1.0003048
Mean: 201.94656
Median Absolute Deviation (MAD): 60
Skewness: -0.67250107
Sum: 26455
Variance: 9784.9433
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=25)

Common Values

Value  Count  Frequency (%)
300  32  0.7%
270  19  0.4%
240  16  0.4%
120  12  0.3%
210  7  0.2%
285  6  0.1%
150  6  0.1%
90  5  0.1%
10  4  0.1%
60  4  0.1%
Other values (15)  20  0.5%
(Missing)  4141  96.9%

Minimum 10 values

Value  Count  Frequency (%)
1  1  < 0.1%
4  1  < 0.1%
6  1  < 0.1%
10  4  0.1%
15  2  < 0.1%
20  1  < 0.1%
40  2  < 0.1%
45  4  0.1%
60  4  0.1%
80  1  < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
300  32  0.7%
294  1  < 0.1%
285  6  0.1%
270  19  0.4%
255  1  < 0.1%
240  16  0.4%
210  7  0.2%
180  1  < 0.1%
160  1  < 0.1%
150  6  0.1%

her2_por_ihc_1
Categorical

Alerts: HIGH CORRELATION, MISSING

Distinct: 5
Distinct (%): 0.1%
Missing: 53
Missing (%): 1.2%
Memory size: 33.5 KiB

0 (negativo)  2617
+++ (positivo)  1224
++ (duvidoso)  247
+ (negativo)  110
indeterminado  21

Length

Max length: 15
Median length: 12
Mean length: 13.018488
Min length: 12

Characters and Unicode

Total characters: 54925
Distinct characters: 19
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: +++ (positivo)
2nd row: 0 (negativo)
3rd row: +++ (positivo)
4th row: 0 (negativo)
5th row: +++ (positivo)

Common Values

Value  Count  Frequency (%)
0 (negativo)  2617  61.3%
+++ (positivo)  1224  28.7%
++ (duvidoso)  247  5.8%
+ (negativo)  110  2.6%
indeterminado  21  0.5%
(Missing)  53  1.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
negativo  2727  32.4%
0  2617  31.1%
(empty)  1581  18.8%
positivo  1224  14.5%
duvidoso  247  2.9%
indeterminado  21  0.2%

Most occurring characters

ValueCountFrequency (%)
5779
10.5%
o 5690
10.4%
i 5464
9.9%
+ 4276
 
7.8%
v 4198
 
7.6%
( 4198
 
7.6%
) 4198
 
7.6%
t 3972
 
7.2%
n 2769
 
5.0%
e 2769
 
5.0%
Other values (9) 11612
21.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 33857
61.6%
Space Separator 5779
 
10.5%
Math Symbol 4276
 
7.8%
Open Punctuation 4198
 
7.6%
Close Punctuation 4198
 
7.6%
Decimal Number 2617
 
4.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 5690
16.8%
i 5464
16.1%
v 4198
12.4%
t 3972
11.7%
n 2769
8.2%
e 2769
8.2%
a 2748
8.1%
g 2727
8.1%
s 1471
 
4.3%
p 1224
 
3.6%
Other values (4) 825
 
2.4%
Space Separator
ValueCountFrequency (%)
5779
100.0%
Math Symbol
ValueCountFrequency (%)
+ 4276
100.0%
Open Punctuation
ValueCountFrequency (%)
( 4198
100.0%
Close Punctuation
ValueCountFrequency (%)
) 4198
100.0%
Decimal Number
ValueCountFrequency (%)
0 2617
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 33857
61.6%
Common 21068
38.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 5690
16.8%
i 5464
16.1%
v 4198
12.4%
t 3972
11.7%
n 2769
8.2%
e 2769
8.2%
a 2748
8.1%
g 2727
8.1%
s 1471
 
4.3%
p 1224
 
3.6%
Other values (4) 825
 
2.4%
Common
ValueCountFrequency (%)
5779
27.4%
+ 4276
20.3%
( 4198
19.9%
) 4198
19.9%
0 2617
12.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 54925
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
5779
10.5%
o 5690
10.4%
i 5464
9.9%
+ 4276
 
7.8%
v 4198
 
7.6%
( 4198
 
7.6%
) 4198
 
7.6%
t 3972
 
7.2%
n 2769
 
5.0%
e 2769
 
5.0%
Other values (9) 11612
21.1%

her2_por_ihc_2
Categorical

Alerts: HIGH CORRELATION, MISSING

Distinct: 5
Distinct (%): 1.2%
Missing: 3847
Missing (%): 90.1%
Memory size: 33.5 KiB

0 (negativo)  300
+++ (positivo)  67
++ (duvidoso)  31
+ (negativo)  26
indeterminado  1

Length

Max length: 15
Median length: 12
Mean length: 12.682353
Min length: 12

Characters and Unicode

Total characters: 5390
Distinct characters: 19
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): 0.2%

Sample

1st row: +++ (positivo)
2nd row: +++ (positivo)
3rd row: +++ (positivo)
4th row: + (negativo)
5th row: 0 (negativo)

Common Values

Value  Count  Frequency (%)
0 (negativo)  300  7.0%
+++ (positivo)  67  1.6%
++ (duvidoso)  31  0.7%
+ (negativo)  26  0.6%
indeterminado  1  < 0.1%
(Missing)  3847  90.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
negativo  326  38.4%
0  300  35.3%
(empty)  124  14.6%
positivo  67  7.9%
duvidoso  31  3.7%
indeterminado  1  0.1%

Most occurring characters

ValueCountFrequency (%)
548
10.2%
o 523
9.7%
i 493
9.1%
v 424
 
7.9%
) 424
 
7.9%
( 424
 
7.9%
t 394
 
7.3%
n 328
 
6.1%
e 328
 
6.1%
a 327
 
6.1%
Other values (9) 1177
21.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3405
63.2%
Space Separator 548
 
10.2%
Close Punctuation 424
 
7.9%
Open Punctuation 424
 
7.9%
Decimal Number 300
 
5.6%
Math Symbol 289
 
5.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 523
15.4%
i 493
14.5%
v 424
12.5%
t 394
11.6%
n 328
9.6%
e 328
9.6%
a 327
9.6%
g 326
9.6%
s 98
 
2.9%
p 67
 
2.0%
Other values (4) 97
 
2.8%
Space Separator
ValueCountFrequency (%)
548
100.0%
Close Punctuation
ValueCountFrequency (%)
) 424
100.0%
Open Punctuation
ValueCountFrequency (%)
( 424
100.0%
Decimal Number
ValueCountFrequency (%)
0 300
100.0%
Math Symbol
ValueCountFrequency (%)
+ 289
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 3405
63.2%
Common 1985
36.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 523
15.4%
i 493
14.5%
v 424
12.5%
t 394
11.6%
n 328
9.6%
e 328
9.6%
a 327
9.6%
g 326
9.6%
s 98
 
2.9%
p 67
 
2.0%
Other values (4) 97
 
2.8%
Common
ValueCountFrequency (%)
548
27.6%
) 424
21.4%
( 424
21.4%
0 300
15.1%
+ 289
14.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5390
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
548
10.2%
o 523
9.7%
i 493
9.1%
v 424
 
7.9%
) 424
 
7.9%
( 424
 
7.9%
t 394
 
7.3%
n 328
 
6.1%
e 328
 
6.1%
a 327
 
6.1%
Other values (9) 1177
21.8%

her2_por_fish_1
Categorical

Alerts: HIGH CORRELATION, MISSING

Distinct: 5
Distinct (%): 0.3%
Missing: 2533
Missing (%): 59.3%
Memory size: 33.5 KiB

não realizado  1155
amplificado  375
sem amplificação  184
duvidoso  20
reação não funcionou  5

Length

Max length: 20
Median length: 13
Mean length: 12.848764
Min length: 8

Characters and Unicode

Total characters: 22344
Distinct characters: 19
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: não realizado
2nd row: não realizado
3rd row: não realizado
4th row: amplificado
5th row: sem amplificação

Common Values

Value  Count  Frequency (%)
não realizado  1155  27.0%
amplificado  375  8.8%
sem amplificação  184  4.3%
duvidoso  20  0.5%
reação não funcionou  5  0.1%
(Missing)  2533  59.3%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
não  1160  37.6%
realizado  1155  37.4%
amplificado  375  12.1%
sem  184  6.0%
amplificação  184  6.0%
duvidoso  20  0.6%
reação  5  0.2%
funcionou  5  0.2%

Most occurring characters

ValueCountFrequency (%)
a 3433
15.4%
o 2929
13.1%
i 2298
10.3%
l 1714
7.7%
d 1570
 
7.0%
1349
 
6.0%
ã 1349
 
6.0%
e 1344
 
6.0%
n 1170
 
5.2%
r 1160
 
5.2%
Other values (9) 4028
18.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 20995
94.0%
Space Separator 1349
 
6.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 3433
16.4%
o 2929
14.0%
i 2298
10.9%
l 1714
8.2%
d 1570
7.5%
ã 1349
 
6.4%
e 1344
 
6.4%
n 1170
 
5.6%
r 1160
 
5.5%
z 1155
 
5.5%
Other values (8) 2873
13.7%
Space Separator
ValueCountFrequency (%)
1349
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 20995
94.0%
Common 1349
 
6.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 3433
16.4%
o 2929
14.0%
i 2298
10.9%
l 1714
8.2%
d 1570
7.5%
ã 1349
 
6.4%
e 1344
 
6.4%
n 1170
 
5.6%
r 1160
 
5.5%
z 1155
 
5.5%
Other values (8) 2873
13.7%
Common
ValueCountFrequency (%)
1349
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 20806
93.1%
None 1538
 
6.9%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 3433
16.5%
o 2929
14.1%
i 2298
11.0%
l 1714
8.2%
d 1570
7.5%
1349
 
6.5%
e 1344
 
6.5%
n 1170
 
5.6%
r 1160
 
5.6%
z 1155
 
5.6%
Other values (7) 2684
12.9%
None
ValueCountFrequency (%)
ã 1349
87.7%
ç 189
 
12.3%

her2_por_fish_2
Categorical

Alerts: HIGH CORRELATION, IMBALANCE, MISSING

Distinct: 5
Distinct (%): 1.7%
Missing: 3969
Missing (%): 92.9%
Memory size: 33.5 KiB

não realizado  221
amplificado  62
sem amplificação  18
reação não funcionou  1
duvidoso  1

Length

Max length: 20
Median length: 13
Mean length: 12.775578
Min length: 8

Characters and Unicode

Total characters: 3871
Distinct characters: 19
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique2 ?
Unique (%)0.7%

Sample

1st rownão realizado
2nd rownão realizado
3rd rownão realizado
4th rownão realizado
5th rowsem amplificação

Common Values

ValueCountFrequency (%)
não realizado 221
 
5.2%
amplificado 62
 
1.5%
sem amplificação 18
 
0.4%
reação não funcionou 1
 
< 0.1%
duvidoso 1
 
< 0.1%
(Missing) 3969
92.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
não | 222 | 40.8%
realizado | 221 | 40.6%
amplificado | 62 | 11.4%
sem | 18 | 3.3%
amplificação | 18 | 3.3%
reação | 1 | 0.2%
funcionou | 1 | 0.2%
duvidoso | 1 | 0.2%

Most occurring characters

Value | Count | Frequency (%)
a | 603 | 15.6%
o | 528 | 13.6%
i | 383 | 9.9%
l | 301 | 7.8%
d | 285 | 7.4%
(space) | 241 | 6.2%
ã | 241 | 6.2%
e | 240 | 6.2%
n | 224 | 5.8%
r | 222 | 5.7%
Other values (9) | 603 | 15.6%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 3630 | 93.8%
Space Separator | 241 | 6.2%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
a | 603 | 16.6%
o | 528 | 14.5%
i | 383 | 10.6%
l | 301 | 8.3%
d | 285 | 7.9%
ã | 241 | 6.6%
e | 240 | 6.6%
n | 224 | 6.2%
r | 222 | 6.1%
z | 221 | 6.1%
Other values (8) | 382 | 10.5%
Space Separator
Value | Count | Frequency (%)
(space) | 241 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 3630 | 93.8%
Common | 241 | 6.2%

Most frequent character per script

Latin
Value | Count | Frequency (%)
a | 603 | 16.6%
o | 528 | 14.5%
i | 383 | 10.6%
l | 301 | 8.3%
d | 285 | 7.9%
ã | 241 | 6.6%
e | 240 | 6.6%
n | 224 | 6.2%
r | 222 | 6.1%
z | 221 | 6.1%
Other values (8) | 382 | 10.5%
Common
Value | Count | Frequency (%)
(space) | 241 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 3611 | 93.3%
None | 260 | 6.7%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
a | 603 | 16.7%
o | 528 | 14.6%
i | 383 | 10.6%
l | 301 | 8.3%
d | 285 | 7.9%
(space) | 241 | 6.7%
e | 240 | 6.6%
n | 224 | 6.2%
r | 222 | 6.1%
z | 221 | 6.1%
Other values (7) | 363 | 10.1%
None
Value | Count | Frequency (%)
ã | 241 | 92.7%
ç | 19 | 7.3%

ki67_%_1
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 67
Distinct (%): 2.0%
Missing: 899
Missing (%): 21.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 36.777942
Minimum: 0
Maximum: 100
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 33.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 5
Q1: 18
median: 30
Q3: 50
95-th percentile: 85
Maximum: 100
Range: 100
Interquartile range (IQR): 32

Descriptive statistics

Standard deviation: 24.745278
Coefficient of variation (CV): 0.67282932
Kurtosis: -0.5643283
Mean: 36.777942
Median Absolute Deviation (MAD): 15
Skewness: 0.73176733
Sum: 124052
Variance: 612.32879
Monotonicity: Not monotonic
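
Several of the statistics reported for ki67_%_1 are related by simple arithmetic. The sketch below checks those relations using only figures printed in this report (4272 observations, 899 missing, sum 124052, standard deviation 24.745278, Q1 18, Q3 50). It assumes the mean is taken over non-missing values only; the variable names are illustrative.

```python
# Figures copied from the report for ki67_%_1.
n_total, n_missing = 4272, 899
total_sum = 124052
std = 24.745278
q1, q3 = 18, 50

n_valid = n_total - n_missing  # rows that actually hold a value
mean = total_sum / n_valid     # mean over non-missing values
cv = std / mean                # coefficient of variation
variance = std ** 2            # variance is the squared standard deviation
iqr = q3 - q1                  # interquartile range

print(f"n_valid={n_valid} mean={mean:.6f} cv={cv:.8f} "
      f"variance={variance:.5f} iqr={iqr}")
```

Up to rounding, the results agree with the reported Mean, Coefficient of variation, Variance, and Interquartile range, which supports reading the mean as sum divided by the 3373 non-missing rows.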
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
30 | 435 | 10.2%
20 | 372 | 8.7%
40 | 330 | 7.7%
10 | 280 | 6.6%
15 | 263 | 6.2%
80 | 208 | 4.9%
70 | 200 | 4.7%
50 | 183 | 4.3%
60 | 182 | 4.3%
25 | 161 | 3.8%
Other values (57) | 759 | 17.8%
(Missing) | 899 | 21.0%

Value | Count | Frequency (%)
0 | 1 | < 0.1%
1 | 13 | 0.3%
2 | 16 | 0.4%
3 | 9 | 0.2%
4 | 5 | 0.1%
5 | 139 | 3.3%
6 | 4 | 0.1%
7 | 4 | 0.1%
8 | 18 | 0.4%
9 | 4 | 0.1%

Value | Count | Frequency (%)
100 | 1 | < 0.1%
99 | 1 | < 0.1%
95 | 28 | 0.7%
90 | 132 | 3.1%
87 | 1 | < 0.1%
85 | 17 | 0.4%
80 | 208 | 4.9%
75 | 10 | 0.2%
73 | 4 | 0.1%
70 | 200 | 4.7%

ki67_%_2
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 38
Distinct (%): 8.1%
Missing: 3800
Missing (%): 89.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 34.28178
Minimum: 2
Maximum: 95
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 33.5 KiB

Quantile statistics

Minimum: 2
5-th percentile: 9.55
Q1: 18
median: 30
Q3: 45
95-th percentile: 80
Maximum: 95
Range: 93
Interquartile range (IQR): 27

Descriptive statistics

Standard deviation: 22.54147
Coefficient of variation (CV): 0.65753501
Kurtosis: 0.010056229
Mean: 34.28178
Median Absolute Deviation (MAD): 12
Skewness: 0.91377781
Sum: 16181
Variance: 508.11788
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=38)
Value | Count | Frequency (%)
30 | 67 | 1.6%
20 | 62 | 1.5%
40 | 53 | 1.2%
10 | 49 | 1.1%
15 | 31 | 0.7%
60 | 26 | 0.6%
70 | 25 | 0.6%
50 | 22 | 0.5%
25 | 20 | 0.5%
35 | 16 | 0.4%
Other values (28) | 101 | 2.4%
(Missing) | 3800 | 89.0%

Value | Count | Frequency (%)
2 | 4 | 0.1%
4 | 1 | < 0.1%
5 | 13 | 0.3%
6 | 3 | 0.1%
8 | 2 | < 0.1%
9 | 1 | < 0.1%
10 | 49 | 1.1%
12 | 4 | 0.1%
15 | 31 | 0.7%
16 | 1 | < 0.1%

Value | Count | Frequency (%)
95 | 3 | 0.1%
90 | 14 | 0.3%
87 | 1 | < 0.1%
85 | 1 | < 0.1%
80 | 16 | 0.4%
75 | 1 | < 0.1%
70 | 25 | 0.6%
65 | 3 | 0.1%
60 | 26 | 0.6%
55 | 3 | 0.1%

Interactions

[Figures: 49 interaction plots (images not preserved in this text extraction)]

Correlations

Variable | record_id | receptor_de_progesterona_quantificacao_%_2 | receptorde_estrogenio_quantificacao_%_2 | indice_h_receptorde_progesterona_1 | indice_h_receptorde_progesterona_2 | ki67_%_1 | ki67_%_2 | repeat_instance_1 | diagnostico_primario_tipo_histologico_1 | diagnostico_primario_tipo_histologico_2 | grau_histologico_1 | grau_histologico_2 | subtipo_tumoral_1 | subtipo_tumoral_2 | receptor_de_estrogenio_1 | receptor_de_estrogenio_2 | receptor_de_progesterona_1 | receptor_de_progesterona_2 | ki67_>14%_1 | ki67_>14%_2 | receptor_de_progesterona_quantificacao_%_1 | receptorde_estrogenio_quantificacao_%_1 | her2_por_ihc_1 | her2_por_ihc_2 | her2_por_fish_1 | her2_por_fish_2
record_id | 1.000 | 0.032 | -0.004 | 0.079 | -0.143 | 0.047 | -0.099 | 0.080 | 0.039 | 0.321 | 0.265 | 0.153 | 0.190 | 0.301 | 0.229 | 0.356 | 0.161 | 0.228 | 0.072 | 0.093 | 0.203 | 0.234 | 0.143 | 0.144 | 0.224 | 0.252
receptor_de_progesterona_quantificacao_%_2 | 0.032 | 1.000 | 0.390 | 0.087 | -0.025 | -0.109 | -0.098 | 1.000 | 0.296 | 0.000 | 0.204 | 0.147 | 0.422 | 0.542 | 0.693 | 0.838 | 0.424 | 0.880 | 0.098 | 0.069 | 0.728 | 0.454 | 0.145 | 0.137 | 0.101 | 0.151
receptorde_estrogenio_quantificacao_%_2 | -0.004 | 0.390 | 1.000 | -0.166 | -0.145 | -0.028 | 0.020 | 1.000 | 0.261 | 0.000 | 0.266 | 0.255 | 0.464 | 0.580 | 0.715 | 0.929 | 0.305 | 0.352 | 0.250 | 0.072 | 0.448 | 0.706 | 0.141 | 0.091 | 0.000 | 0.226
indice_h_receptorde_progesterona_1 | 0.079 | 0.087 | -0.166 | 1.000 | 0.934 | -0.200 | -0.168 | 1.000 | 0.015 | 1.000 | 0.159 | 0.254 | 0.116 | 0.060 | 0.080 | 0.000 | 0.107 | 0.000 | 0.145 | 0.184 | 0.833 | 0.109 | 0.027 | 0.000 | 0.000 | 0.000
indice_h_receptorde_progesterona_2 | -0.143 | -0.025 | -0.145 | 0.934 | 1.000 | -0.106 | -0.093 | 1.000 | 1.000 | 1.000 | 0.193 | 0.172 | 0.000 | 0.217 | 0.000 | 1.000 | 0.128 | 1.000 | 0.130 | 0.167 | 0.790 | 0.100 | 0.201 | 0.163 | 0.000 | 0.143
ki67_%_1 | 0.047 | -0.109 | -0.028 | -0.200 | -0.106 | 1.000 | 0.768 | 1.000 | 0.078 | 0.150 | 0.410 | 0.462 | 0.480 | 0.377 | 0.513 | 0.337 | 0.322 | 0.231 | 0.621 | 0.404 | 0.151 | 0.190 | 0.112 | 0.057 | 0.103 | 0.193
ki67_%_2 | -0.099 | -0.098 | 0.020 | -0.168 | -0.093 | 0.768 | 1.000 | 1.000 | 0.000 | 0.074 | 0.445 | 0.476 | 0.356 | 0.509 | 0.453 | 0.503 | 0.247 | 0.283 | 0.360 | 0.654 | 0.179 | 0.213 | 0.146 | 0.048 | 0.119 | 0.145
repeat_instance_1 | 0.080 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.019 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.000 | 1.000 | 1.000 | 1.000
diagnostico_primario_tipo_histologico_1 | 0.039 | 0.296 | 0.261 | 0.015 | 1.000 | 0.078 | 0.000 | 1.000 | 1.000 | 0.772 | 0.104 | 0.000 | 0.142 | 0.126 | 0.267 | 0.161 | 0.165 | 0.000 | 0.000 | 0.000 | 0.178 | 0.080 | 0.039 | 0.000 | 0.000 | 0.100
diagnostico_primario_tipo_histologico_2 | 0.321 | 0.000 | 0.000 | 1.000 | 1.000 | 0.150 | 0.074 | 1.000 | 0.772 | 1.000 | 0.000 | 0.064 | 0.112 | 0.114 | 0.289 | 0.188 | 0.129 | 0.079 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
grau_histologico_1 | 0.265 | 0.204 | 0.266 | 0.159 | 0.193 | 0.410 | 0.445 | 1.000 | 0.104 | 0.000 | 1.000 | 0.826 | 0.433 | 0.447 | 0.490 | 0.307 | 0.316 | 0.205 | 0.302 | 0.483 | 0.274 | 0.305 | 0.071 | 0.113 | 0.060 | 0.079
grau_histologico_2 | 0.153 | 0.147 | 0.255 | 0.254 | 0.172 | 0.462 | 0.476 | 1.000 | 0.000 | 0.064 | 0.826 | 1.000 | 0.431 | 0.431 | 0.340 | 0.337 | 0.167 | 0.170 | 0.506 | 0.493 | 0.281 | 0.334 | 0.171 | 0.185 | 0.144 | 0.141
subtipo_tumoral_1 | 0.190 | 0.422 | 0.464 | 0.116 | 0.000 | 0.480 | 0.356 | 0.019 | 0.142 | 0.112 | 0.433 | 0.431 | 1.000 | 0.723 | 0.982 | 0.636 | 0.585 | 0.375 | 0.536 | 0.432 | 0.458 | 0.499 | 0.485 | 0.354 | 0.477 | 0.411
subtipo_tumoral_2 | 0.301 | 0.542 | 0.580 | 0.060 | 0.217 | 0.377 | 0.509 | 1.000 | 0.126 | 0.114 | 0.447 | 0.431 | 0.723 | 1.000 | 0.907 | 0.974 | 0.470 | 0.505 | 0.449 | 0.949 | 0.484 | 0.506 | 0.450 | 0.472 | 0.397 | 0.479
receptor_de_estrogenio_1 | 0.229 | 0.693 | 0.715 | 0.080 | 0.000 | 0.513 | 0.453 | 1.000 | 0.267 | 0.289 | 0.490 | 0.340 | 0.982 | 0.907 | 1.000 | 0.898 | 0.795 | 0.621 | 0.256 | 0.225 | 0.816 | 0.944 | 0.094 | 0.068 | 0.280 | 0.176
receptor_de_estrogenio_2 | 0.356 | 0.838 | 0.929 | 0.000 | 1.000 | 0.337 | 0.503 | 1.000 | 0.161 | 0.188 | 0.307 | 0.337 | 0.636 | 0.974 | 0.898 | 1.000 | 0.451 | 0.849 | 0.149 | 0.208 | 0.549 | 0.916 | 0.000 | 0.056 | 0.042 | 0.000
receptor_de_progesterona_1 | 0.161 | 0.424 | 0.305 | 0.107 | 0.128 | 0.322 | 0.247 | 1.000 | 0.165 | 0.129 | 0.316 | 0.167 | 0.585 | 0.470 | 0.795 | 0.451 | 1.000 | 0.633 | 0.169 | 0.115 | 0.930 | 0.401 | 0.065 | 0.000 | 0.169 | 0.000
receptor_de_progesterona_2 | 0.228 | 0.880 | 0.352 | 0.000 | 1.000 | 0.231 | 0.283 | 1.000 | 0.000 | 0.079 | 0.205 | 0.170 | 0.375 | 0.505 | 0.621 | 0.849 | 0.633 | 1.000 | 0.252 | 0.103 | 0.378 | 0.645 | 0.000 | 0.164 | 0.115 | 0.122
ki67_>14%_1 | 0.072 | 0.098 | 0.250 | 0.145 | 0.130 | 0.621 | 0.360 | 1.000 | 0.000 | 0.000 | 0.302 | 0.506 | 0.536 | 0.449 | 0.256 | 0.149 | 0.169 | 0.252 | 1.000 | 0.432 | 0.167 | 0.057 | 0.080 | 0.109 | 0.047 | 0.130
ki67_>14%_2 | 0.093 | 0.069 | 0.072 | 0.184 | 0.167 | 0.404 | 0.654 | 1.000 | 0.000 | 0.000 | 0.483 | 0.493 | 0.432 | 0.949 | 0.225 | 0.208 | 0.115 | 0.103 | 0.432 | 1.000 | 0.133 | 0.000 | 0.017 | 0.106 | 0.065 | 0.057
receptor_de_progesterona_quantificacao_%_1 | 0.203 | 0.728 | 0.448 | 0.833 | 0.790 | 0.151 | 0.179 | 1.000 | 0.178 | 0.000 | 0.274 | 0.281 | 0.458 | 0.484 | 0.816 | 0.549 | 0.930 | 0.378 | 0.167 | 0.133 | 1.000 | 0.392 | 0.122 | 0.000 | 0.156 | 0.076
receptorde_estrogenio_quantificacao_%_1 | 0.234 | 0.454 | 0.706 | 0.109 | 0.100 | 0.190 | 0.213 | 1.000 | 0.080 | 0.000 | 0.305 | 0.334 | 0.499 | 0.506 | 0.944 | 0.916 | 0.401 | 0.645 | 0.057 | 0.000 | 0.392 | 1.000 | 0.148 | 0.026 | 0.164 | 0.326
her2_por_ihc_1 | 0.143 | 0.145 | 0.141 | 0.027 | 0.201 | 0.112 | 0.146 | 0.000 | 0.039 | 0.000 | 0.071 | 0.171 | 0.485 | 0.450 | 0.094 | 0.000 | 0.065 | 0.000 | 0.080 | 0.017 | 0.122 | 0.148 | 1.000 | 0.486 | 0.528 | 0.500
her2_por_ihc_2 | 0.144 | 0.137 | 0.091 | 0.000 | 0.163 | 0.057 | 0.048 | 1.000 | 0.000 | 0.000 | 0.113 | 0.185 | 0.354 | 0.472 | 0.068 | 0.056 | 0.000 | 0.164 | 0.109 | 0.106 | 0.000 | 0.026 | 0.486 | 1.000 | 0.446 | 0.613
her2_por_fish_1 | 0.224 | 0.101 | 0.000 | 0.000 | 0.000 | 0.103 | 0.119 | 1.000 | 0.000 | 0.000 | 0.060 | 0.144 | 0.477 | 0.397 | 0.280 | 0.042 | 0.169 | 0.115 | 0.047 | 0.065 | 0.156 | 0.164 | 0.528 | 0.446 | 1.000 | 0.536
her2_por_fish_2 | 0.252 | 0.151 | 0.226 | 0.000 | 0.143 | 0.193 | 0.145 | 1.000 | 0.100 | 0.000 | 0.079 | 0.141 | 0.411 | 0.479 | 0.176 | 0.000 | 0.000 | 0.122 | 0.130 | 0.057 | 0.076 | 0.326 | 0.500 | 0.613 | 0.536 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
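
To make the idea of nullity correlation concrete, the following sketch computes the Pearson correlation between the "is present" indicators of two columns in plain Python. The function name `nullity_correlation` and the toy columns are hypothetical and are not taken from the dataset; it is also an assumption that the heatmap is built from a correlation of this kind. With pandas, a comparable matrix is commonly obtained with `df.isna().corr()`.

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A fully present or fully missing column has no variation to correlate.
        return float("nan")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical toy columns; None marks a missing cell.
first_measure = [20.0, None, 40.0, None, 10.0, 50.0]
second_measure = [None, None, 40.0, None, None, 60.0]
other_test = [None, "amplificado", None, "duvidoso", None, None]

r_same = nullity_correlation(first_measure, second_measure)
r_opposite = nullity_correlation(first_measure, other_test)
print(r_same, r_opposite)
```

A value near 1 means the two columns tend to be filled in together, a value near -1 means one tends to be filled in exactly when the other is missing, and a value near 0 means the missingness patterns are unrelated.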

Sample

record_idrepeat_instrument_1repeat_instrument_2repeat_instance_1repeat_instance_2diagnostico_primario_tipo_histologico_1diagnostico_primario_tipo_histologico_2grau_histologico_1grau_histologico_2subtipo_tumoral_1subtipo_tumoral_2receptor_de_estrogenio_1receptor_de_estrogenio_2receptor_de_progesterona_1receptor_de_progesterona_2ki67_>14%_1ki67_>14%_2receptor_de_progesterona_quantificacao_%_1receptor_de_progesterona_quantificacao_%_2receptorde_estrogenio_quantificacao_%_1receptorde_estrogenio_quantificacao_%_2indice_h_receptorde_progesterona_1indice_h_receptorde_progesterona_2her2_por_ihc_1her2_por_ihc_2her2_por_fish_1her2_por_fish_2ki67_%_1ki67_%_2
0302Dados Histopatologicos MamaNaN1.0NaNNaNNaNNaNNaN5.0NaNpositivoNaNpositivoNaNNaNNaNNaNNaNNaNNaNNaNNaN+++ (positivo)NaNNaNNaNNaNNaN
1710Dados Histopatologicos MamaNaN1.0NaNNaNNaNNaNNaN2.0NaNpositivoNaNnegativoNaNpositivoNaNNaNNaNNaNNaNNaNNaN0 (negativo)NaNNaNNaN20.0NaN
2752Dados Histopatologicos MamaNaN1.0NaNNaNNaNNaNNaN3.0NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN+++ (positivo)NaNNaNNaNNaNNaN
31367Dados Histopatologicos MamaDados Histopatologicos Mama1.02.0NaNNaNNaNNaN2.05.0positivopositivopositivonegativopositivopositivo5NaN603010.0NaN0 (negativo)+++ (positivo)NaNNaN40.040.0
41589Dados Histopatologicos MamaNaN1.0NaNNÃO-ESPECIAL - Carcinoma de mama ductal invasivo (CDI)/SOENaNNaNNaN5.0NaNpositivoNaNpositivoNaNNaNNaNNaNNaNNaNNaNNaNNaN+++ (positivo)NaNNaNNaNNaNNaN
51705Dados Histopatologicos MamaNaN1.0NaNNÃO-ESPECIAL - Carcinoma de mama ductal invasivo (CDI)/SOENaNNaNNaN4.0NaNnegativoNaNnegativoNaNpositivoNaNNaNNaNNaNNaNNaNNaN0 (negativo)NaNNaNNaNNaNNaN
61843Dados Histopatologicos MamaNaN1.0NaNNaNNaNNaNNaN3.0NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN+++ (positivo)NaNNaNNaNNaNNaN
71873Dados Histopatologicos MamaNaN1.0NaNNaNNaNNaNNaN5.0NaNpositivoNaNpositivoNaNnegativoNaN90NaN90NaNNaNNaN+++ (positivo)NaNnão realizadoNaN10.0NaN
81898Dados Histopatologicos MamaNaN1.0NaNNaNNaNNaNNaN3.0NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN+++ (positivo)NaNNaNNaNNaNNaN
91960Dados Histopatologicos MamaNaN1.0NaNNaNNaNNaNNaN3.0NaNnegativoNaNnegativoNaNNaNNaNNaNNaNNaNNaNNaNNaN+++ (positivo)NaNNaNNaNNaNNaN
record_idrepeat_instrument_1repeat_instrument_2repeat_instance_1repeat_instance_2diagnostico_primario_tipo_histologico_1diagnostico_primario_tipo_histologico_2grau_histologico_1grau_histologico_2subtipo_tumoral_1subtipo_tumoral_2receptor_de_estrogenio_1receptor_de_estrogenio_2receptor_de_progesterona_1receptor_de_progesterona_2ki67_>14%_1ki67_>14%_2receptor_de_progesterona_quantificacao_%_1receptor_de_progesterona_quantificacao_%_2receptorde_estrogenio_quantificacao_%_1receptorde_estrogenio_quantificacao_%_2indice_h_receptorde_progesterona_1indice_h_receptorde_progesterona_2her2_por_ihc_1her2_por_ihc_2her2_por_fish_1her2_por_fish_2ki67_%_1ki67_%_2
426282100Dados Histopatologicos MamaNaN1.0NaNNaNNaNNaNNaN3.0NaNNaNNaNNaNNaNpositivoNaNNaNNaNNaNNaNNaNNaN+++ (positivo)NaNNaNNaN50.0NaN
426382111Dados Histopatologicos MamaNaN1.0NaNNaNNaNNaNNaN3.0NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN+++ (positivo)NaNNaNNaNNaNNaN
426482112Dados Histopatologicos MamaNaN1.0NaNNaNNaNNaNNaN3.0NaNNaNNaNNaNNaNpositivoNaNNaNNaNNaNNaNNaNNaN+++ (positivo)NaNNaNNaN90.0NaN
426582118Dados Histopatologicos MamaNaN1.0NaNNaNNaNNaNNaN3.0NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN+++ (positivo)NaNNaNNaNNaNNaN
426682122Dados Histopatologicos MamaDados Histopatologicos Mama1.02.0NÃO-ESPECIAL - Carcinoma de mama ductal invasivo (CDI)/SOENÃO-ESPECIAL - Carcinoma de mama ductal invasivo (CDI)/SOENaNNaN2.05.0positivopositivopositivopositivopositivopositivo60108080NaNNaN++ (duvidoso)+++ (positivo)não realizadoNaN40.060.0
426782123Dados Histopatologicos MamaDados Histopatologicos Mama1.02.0NaNNaNNaNNaN3.03.0NaNNaNNaNNaNpositivoNaNNaNNaNNaNNaNNaNNaN+++ (positivo)+++ (positivo)NaNNaN20.0NaN
426882124Dados Histopatologicos MamaNaN1.0NaNNaNNaNNaNNaN3.0NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN+++ (positivo)NaNNaNNaNNaNNaN
426982131Dados Histopatologicos MamaNaN1.0NaNNaNNaNNaNNaN3.0NaNNaNNaNNaNNaNpositivoNaNNaNNaNNaNNaNNaNNaN+++ (positivo)NaNNaNNaN50.0NaN
427082205Dados Histopatologicos MamaNaN1.0NaNNaNNaNNaNNaN4.0NaNnegativoNaNnegativoNaNpositivoNaNNaNNaNNaNNaNNaNNaN0 (negativo)NaNNaNNaN90.0NaN
427182240Dados Histopatologicos MamaNaN1.0NaNNÃO-ESPECIAL - Carcinoma de mama ductal invasivo (CDI)/SOENaNNaNNaN4.0NaNnegativoNaNnegativoNaNNaNNaNNaNNaNNaNNaNNaNNaN0 (negativo)NaNNaNNaNNaNNaN